Affiliations: 1) Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 2) University of California Los Angeles, USA, 3) University of California Santa Barbara, USA, 4) Arizona State University, Tempe, AZ USA. *Corresponding author: corina_logan@eva.mpg.de
This is the post-study manuscript of the preregistration that was pre-study peer reviewed and received an In Principle Recommendation on 26 Mar 2019 by:
Aurélie Coulon (2019) Can context changes improve behavioral flexibility? Towards a better understanding of species adaptability to environmental changes. Peer Community in Ecology, 100019. 10.24072/pci.ecology.100019. Reviewers: Maxime Dahirel and Andrea Griffin
Preregistration: html, pdf, rmd
Post-study manuscript (submitted to PCI Ecology for post-study peer review on 3 Jan 2022): preprint pdf at EcoEvoRxiv. Revised 15 Aug 2022: pdf at EcoEvoRxiv, html, rmd
Behavioral flexibility, the ability to adapt behavior to new circumstances, is thought to play an important role in a species’ ability to successfully adapt to new environments and expand its geographic range. However, flexibility is rarely directly tested in species in a way that would allow us to determine how flexibility works to predict a species’ ability to adapt its behavior to new environments. We use great-tailed grackles (Quiscalus mexicanus; a bird species) as a model to investigate this question because they have recently rapidly expanded their range into North America. We attempted to manipulate grackle flexibility using colored tube reversal learning to determine whether flexibility is generalizable across contexts (multi-access box), and what learning strategies grackles employ. We found that flexibility was manipulatable: birds in the manipulated group took fewer trials to pass criterion with increasing reversal number, and they reversed a color preference in fewer trials by the end of their serial reversals compared to control birds who had only one reversal. Birds that passed their last reversal faster were also more flexible (faster to switch between loci) and innovative (solved more loci) on a multi-access box. All grackles in the manipulated reversal learning group used one learning strategy (epsilon-decreasing: long exploration period) in all reversals and did not use the epsilon-first strategy (quickly shifting their preference), and none used a particular exploration or exploitation strategy earlier or later in their serial reversals. Understanding how flexibility causally relates to other traits will allow researchers to develop robust theory about what flexibility is and when to invoke it as a primary driver in a given context, such as a rapid geographic range expansion.
Behavioral flexibility, the ability to adapt behavior to new circumstances (see Mikhalevich et al., 2017 for the theoretical background on this definition), is thought to play an important role in a species’ ability to successfully adapt to new environments and expand its geographic range (e.g., Lefebvre et al., 1997; Sol et al., 2002, 2005, 2007; Sol & Lefebvre, 2000). The behavioral flexibility (hereafter referred to as flexibility) of individuals is considered an important trait that facilitates the capacity for learning, which is then associated with problem solving ability (applying what one has learned about the world to then attempt to access a resource that is not readily accessible) (see review in Lea et al., 2020). It is hypothesized that, through flexibility, individuals can increase the diversity of their behaviors either via asocial learning (innovativeness) or social learning, leading to the establishment of the population in a new area (Wright et al., 2010).
It is predicted that flexibility should positively relate with innovativeness, the ability to create a new behavior or use an existing behavior in a new situation (Griffin & Guez, 2014). However, these predictions are based on species-level data and proxies for flexibility and for innovation when examining such relationships (see Logan et al., 2018). Flexibility is rarely directly tested in species that are rapidly expanding their geographic ranges in a way that would allow us to determine how flexibility works and predict a species’ ability to adapt their behavior to new areas. Those investigations that examine the relationship between flexibility and innovation [or problem solving - a type of experimental assay that does not necessarily require innovativeness to solve, e.g., the ability to solve tasks using pre-trained behaviors; Griffin & Guez (2014)] in species that are expanding their range show mixed results, with these variables correlating positively (e.g., grey squirrels: Chow et al., 2016), negatively (e.g., Indian mynas: Griffin et al., 2013), or not at all (e.g., stick tool use and string pulling in great-tailed grackles: Logan, 2016).
The first step to improving our understanding of whether and how flexibility relates to innovativeness, and the focus of the current investigation, is to start with one population and perform a manipulative experiment on one of the variables to determine whether there is an associated change in the other. Once this association is known, future research can then investigate whether flexibility and innovativeness are involved in a range expansion. Manipulative experiments go beyond correlations to infer a cause and effect relationship between the manipulated variable and the variable(s) measured after the manipulation (Hernán & Robins, 2006; McElreath, 2020). A manipulative experiment combined with the random assignment of subjects to a condition (manipulated group or control group) eliminates many confounds associated with internal and external variation (for example, season, motivation, sex, and so on). Such manipulative experiments in behavioral ecology have primarily been conducted in laboratory settings because of the increased feasibility; however, such experiments are now also being conducted in wild settings (Aplin et al., 2015).
We focused our study on one population of great-tailed grackles (Quiscalus mexicanus, hereafter grackles), a bird species that is flexible (Logan, 2016) and, while they are originally from Central America, they have rapidly expanded their geographic range across the US since 1880 (Summers et al., 2022; Wehtje, 2003). We attempted to manipulate grackle flexibility using serial reversals of a color preference to determine whether their flexibility is generalizable across additional experimental contexts (touchscreen reversal learning and multi-access box solution switching), whether improving flexibility also improves innovativeness (number of loci solved on a multi-access box), and what learning strategies grackles employ (Figure 1).
Reversal learning is a common way of measuring flexibility that has been used for many decades across many species, therefore lending itself well to comparative analyses and generalizations (see review in Lea et al., 2020). In this test, an individual learns to prefer the rewarded option, which differs from the non-rewarded option in color, shape, space, or another obvious feature. Once this initial preference is formed, the previously non-rewarded option becomes the rewarded option and vice versa, and the preference is reversed. Individuals who are faster to reverse their preference are considered more flexible - better able to change their behavior when the circumstances change. Serial reversal learning involves continuing to reverse the preference back and forth to determine whether individuals learn a “win-stay, lose-shift” rule that, when the reward is no longer in the expected option, they should switch to preferring the other option (Spence, 1936; J. Warren, 1965; J. M. Warren, 1965). Once this rule is learned, it can then be applied to new contexts and result in improved performance over individuals who have not learned this rule (J. M. Warren, 1965). We randomly assigned individuals to a manipulated or control condition and used serial reversals (for the manipulated group) to attempt to manipulate flexibility and determine whether the manipulated individuals were then more flexible and more innovative in other contexts.
If grackle flexibility is manipulatable using serial reversals, this would provide us with a useful tool for investigating the relationship between flexibility and any number of other variables implicated in geographic range expansions. It would provide researchers with a way to examine the direct links between, for example, flexibility and exploration, to determine whether they are connected and in which direction, which could provide insights into how populations establish in a new location if cross-population manipulations were conducted. If the flexibility manipulation is not successful, this could indicate either that we did not manipulate the right aspect of flexibility (e.g., perhaps training them to solve a variety of different types of tasks quickly would be more effective) or that grackle flexibility is not a trait that is trainable.
Figure 1. A visual illustration of Hypothesis 1 (A), Hypothesis 2 (B), and Hypothesis 4 (C). Longer black arrows indicate slower reversal times; the two yellow circles represent experience with the two yellow tubes that both contained food for the control group.
Figure 2. The experimental apparatuses: reversal learning using dark gray and light gray tubes or two different shapes on a touchscreen, and the wooden and plastic multi-access boxes (MAB). The wooden MAB has four loci, each containing food and each locus has a distinct way of being opened: lift up flap (A), swing open flap (B), pull out drawer (C), or push in flap (D). The plastic MAB has four loci that all provide access to one piece of food and each locus has a distinct way of being opened: open the window (left side), pull the string (top side), push the shovel (right side), or twist the shovel (bottom side).
Prediction 1: Individuals improve their flexibility on a serial reversal learning task using colored tubes by generally requiring fewer trials to reverse a preference as the number of reversals increases (manipulation condition). Their flexibility on this test is manipulated relative to control birds who do not undergo serial reversals. Instead, individuals in the control condition are matched to manipulated birds for experience (they experience a similar number of trials), but there is no possibility of a functional tube preference because both tubes are the same color (yellow) and both contain food, therefore either choice is correct.
P1 alternative 1: If the number of trials to reverse a preference does not correlate with, or positively correlates with, reversal number (which, together with Prediction 1, would account for all potential correlation outcomes), this suggests that some individuals may prefer to rely on information acquired previously (i.e., they are slow to reverse) rather than relying on current cues (e.g., the food is in a new location) (e.g., Griffin & Guez, 2014; Liu et al., 2016; Manrique et al., 2013; but see Homberg et al., 2007).
P2: Individuals that have improved their flexibility on a serial reversal learning task using colored tubes (requiring fewer trials to reverse a preference as the number of reversals increases) are faster to switch between new methods of solving (latency to solve or attempt to solve a new way of accessing the food [locus]), and learn more new loci (higher total number of solved loci) on multi-access box flexibility tasks, and are faster to reverse preferences in a serial reversal task using a touchscreen than individuals in the control group where flexibility has not been manipulated. The positive correlation between reversal learning performance using colored tubes and a touchscreen (faster birds have fewer trials) and the multi-access boxes (faster birds have lower latencies) indicates that all three tests measure the same ability even though the multi-access boxes require inventing new rules to solve new loci (while potentially learning a rule about switching: “when an option becomes non-functional, try a different option”) while reversal learning requires switching between two rules (“choose light gray” or “choose dark gray”) or learning the rule to “switch when the previously rewarded option no longer contains a reward”. Serial reversals eliminate the confounds of exploration, inhibition, and persistence in explaining reversal learning speed because, after multiple reversals, what is being measured is the ability to learn one or more rules. If the manipulation works, this indicates that flexibility can be influenced by previous experience and might indicate that any individual has the potential to move into new environments (see relevant hypotheses in preregistrations on genetics (R1) and expansion (H1)).
P2 alternative 1: If the manipulation does not work in that those individuals in the experimental condition do not decrease their reversal speeds more than control individuals, then this experiment elucidates whether general individual variation in flexibility relates to flexibility in new contexts (two distinct multi-access boxes and serial reversals on a touchscreen) as well as problem solving ability (multi-access boxes). The prediction is the same as in P2, but in this case variation in flexibility is constrained by traits inherent to the individual (some of which will be tested in McCune KB et al., 2019), which suggests that certain individuals will be more likely to move into new environments.
P2 alternative 2: If there is no correlation between reversal learning speed (colored tubes) and the latency to solve/attempt a new locus on the multi-access boxes, this could be because the latency to solve not only measures flexibility but also innovativeness. In this case, an additional analysis is run with the latency to solve as the response variable, to determine whether the fit of the model (as determined by the lower AIC value) with reversal learning as an explanatory variable is improved if motor diversity (the number of different motor actions used when attempting to solve the multi-access box) is included as an explanatory variable (see Diquelou et al., 2015; Griffin et al., 2016). If the inclusion of motor diversity improves the model fit, then this indicates that the latency to solve a new locus on the multi-access box is influenced by flexibility (reversal learning speed) and innovation (motor diversity).
P2 alternative 3: If there is a negative correlation or no correlation between reversal learning speed on colored tubes and reversal learning speed on the touchscreen, then this indicates that it may be difficult for individuals to perceive and/or understand images on the touchscreen in contrast with physical objects (colored tubes) (e.g., O’Hara et al., 2015).
P4: Individuals prefer a mixture of learning strategies in the first serial reversals (an epsilon-decreasing strategy where individuals explore both options extensively before learning to prefer the rewarded option, and an epsilon-first strategy where the correct choice is consistently made after the first trial), and then move toward the epsilon-first learning strategy. The epsilon-first strategy works better later in the serial reversals where the reward is all or nothing because individuals have learned the environment is changing in predictable ways (Bergstrom & Lachmann, 2004): only one option is consistently rewarded, and if the reward isn’t in the previously rewarded option, it must be in the other option.
P4 alternative 1: Individuals continue to prefer a mixture of learning strategies, and/or they do not converge on the more functional epsilon-first learning strategy, regardless of how many reversals they participate in. This pattern could suggest that the grackles do not attend to functional meta-strategies, that is, they do not learn the overarching rule (once food is found in the non-preferred tube, one must switch to preferring that tube color), but rather they learn each preference change as if it was new.
Please see our preregistration that received in principle acceptance at PCI Ecology (PDF version) for all of the preregistered methods. Below, we include a summary and describe all deviations from the preregistration. We present the results from different hypotheses in separate articles: this one, K. McCune et al. (2022), and Lukas et al. (2022).
Great-tailed grackles were caught in the wild in Tempe, Arizona, USA for individual identification (colored leg bands in unique combinations). Some individuals (~32: ~16 in the control group (they receive 1 reversal) and ~16 in the flexibility manipulation (they receive multiple reversals)) were brought temporarily into aviaries for testing, and then released back to the wild.
We stopped testing birds after we completed two full aviary seasons because the sample size was above the minimum suggested boundary of 15 (to detect a medium effect size) based on model simulations (see Supplementary Material 6).
Design files for the plastic multi-access box: 3D printer files and laser cutter files
Testing protocols for all three experiments: colored tube reversal learning, plastic multi-access box, wooden multi-access box, and touchscreen reversal learning
The data are available at the Knowledge Network for Biocomplexity’s data repository: https://knb.ecoinformatics.org/view/corina_logan.84.42.
H1: Subjects were randomly assigned to the manipulated or control group. In the reversal learning trials, the rewarded option is pseudorandomized for side (and the option on the left is always placed first). Pseudorandomization consisted of alternating location for the first two trials of a session and then keeping the same color on the same side for at most two consecutive trials thereafter. A list of all 88 unique trial sequences for a 10-trial session, following the pseudorandomization rules, was generated in advance for experimenters to use during testing (e.g., a randomized trial sequence might look like: LRLLRRLRLR, where L and R refer to the location, left or right, of the rewarded tube). Randomized trial sequences were assigned randomly to any given 10-trial session using a random number generator (random.org) to generate a number from 1-88. The only exception to this randomization was when an individual exhibited a side bias (choosing one side 4 or more trials in a row). In these cases, we stopped the current random numbers for side and started putting the rewarded color on the non-preferred side as much as possible while still following the pseudorandomization rules until the individual stopped exhibiting a side bias.
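The pseudorandomization rules and the side-bias rule described above can be sketched as two small checks. This is a minimal illustration under our reading of the text, not the procedure the experimenters used: the function names `follows_pseudorandomization` and `has_side_bias` are hypothetical, and the sketch does not attempt to reproduce the list of 88 sequences, which may have been generated under additional constraints not stated here.

```python
def follows_pseudorandomization(sequence: str) -> bool:
    """Check a session's rewarded-side sequence, e.g. 'LRLLRRLRLR'.

    Assumed rules (from the text): the first two trials alternate sides,
    and the rewarded side never stays the same for more than two
    consecutive trials.
    """
    if len(sequence) < 2 or set(sequence) - {"L", "R"}:
        return False
    if sequence[0] == sequence[1]:
        return False  # first two trials must alternate
    run = 1
    for prev, cur in zip(sequence, sequence[1:]):
        run = run + 1 if cur == prev else 1
        if run > 2:
            return False  # same side more than twice in a row
    return True


def has_side_bias(choices: str, threshold: int = 4) -> bool:
    """Return True if the bird chose one side `threshold` or more times in a row."""
    run, prev = 0, None
    for side in choices:
        run = run + 1 if side == prev else 1
        prev = side
        if run >= threshold:
            return True
    return False


# The example sequence from the text satisfies both stated rules.
print(follows_pseudorandomization("LRLLRRLRLR"))
print(has_side_bias("LRLLLLRL"))
```

A check like `has_side_bias` would be applied to the bird's choices, whereas `follows_pseudorandomization` applies to the experimenter's placement of the rewarded tube.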
Analyses were conducted in R [current version 4.1.2; R Core Team (2017)], using several R packages: Zhu (2021), Hlavac (2018), Hadfield (2010), Bartoń (2020), McElreath (2020), Stan Development Team (2020), Xie (2019), Ushey et al. (2020), Eddelbuettel & François (2011), Wickham (2016), knitr (Xie, 2013, 2017, 2018), Wickham et al. (2021), Gabry & Češnovar (2021), posterior (Bürkner et al., 2020), cowplot (Wilke, n.d.), bayesplot (Gabry et al., 2019), irr (Gamer et al., 2012), psych (Revelle, 2014, 2017), Lin (2020), DHARMa (Hartig, 2019), lme4 (Bates et al., 2012; Bates et al., 2015).
Unregistered analyses: We conducted unregistered interobserver reliability analyses on the response variables. Scores indicated that the response variables are repeatable to a high or extremely high degree given our instructions and training (see Supplementary Material 5).
Planned analyses: When there is more than one experimenter within a test, experimenter will be added as a random effect to account for potential differences between experimenters in conducting the tests. If there are no differences between models including or excluding experimenter as a random effect, then we will use the model without this random effect for simplicity.
The data were checked for overdispersion, underdispersion, zero-inflation, and heteroscedasticity with the DHARMa R package (Hartig, 2019) following methods by Hartig. Note: DHARMa doesn’t support MCMCglmm, therefore we will use the closest supported model: glmer from the R package lme4 (Bates et al., 2015) for the DHARMa data checking.
The plan: We initially (in 2017) set as the passing criterion: During the data collection period, the number of trials required to reverse a preference will be documented per bird, and reversals will continue until the first batch of birds tested reaches an asymptote (i.e., there are negligible further decreases in the number of trials required to reverse a preference). The number of reversals to reach the asymptote will be the number of reversals that subsequent birds experience.
Choice criterion: At the beginning of the second bird’s initial discrimination in the reversal learning colored tube experiment (October 2018), we revised the criterion for what counts as a choice from A) the bird’s head needs to pass an invisible line on the table that ran perpendicular to the tube opening to B) the bird needs to bend its body or head down to look in the tube. Criterion A resulted in birds making more choices than the number of learning opportunities they were exposed to (because they could not see whether there was food in the tube unless they bent their head down to look in the tube) and appeared to result in slower learning. It is important that one choice equals one learning opportunity, therefore we revised the choice criterion to the latter. Anecdotally, this choice matters because the first three birds in the experiment (Tomatillo, Chalupa, and Queso) learned faster than the pilot birds (Empanada and Fajita) in their initial discriminations and first reversals. Thus, it was an important change to make at the beginning of the experiment (after testing the two pilot birds and before collecting any data that were included in analyses).
Criterion to pass the control condition: Before collecting experimental data, we set the number of trials experienced by the birds in the control group as 1100 because this is how many trials it would have taken the pilot bird in the manipulated group, Fajita, to pass serial reversals 2-17 according to our revised serial reversal passing criterion. However, after 25 and 17 days (after Tomatillo and Queso’s first reversals, respectively) of testing the first two individuals in the control group, it became apparent that 1100 trials is impractical given the time constraints for how long we were permitted to keep each bird temporarily in captivity and would prevent birds from completing the test battery before their release. Additionally, after revising the choice criterion, it was going to be likely that birds in the manipulated group would require fewer than 1100 trials to meet the serial reversal passing criterion. Therefore, reducing the number of trials the control birds experience would result in a better match of experience with birds in the manipulated group. On 2 November 2018 we set the number of trials control birds experience after their first (and only) reversal to the number of trials it requires the first bird in the manipulated group to pass (the first bird has not passed yet, therefore we do not yet know what this number is). After more individuals in the manipulated group passed, we updated this number to the average number of trials to pass. This applied to all birds in the control condition, except Mofongo (see next paragraph).
Analysis: Response variable: Number of trials to reverse a preference. An individual is considered to have a preference if it chose the rewarded option at least 17 out of the most recent 20 trials (with a minimum of 8 or 9 correct choices out of 10 on the two most recent sets of 10 trials). We use a sliding window to look at the most recent 10 trials for a bird, regardless of when the testing sessions occurred. Explanatory variable: reversal number. Random variables: batch (random effect because multiple batches included in the analysis; batch is a test cohort, consisting of 8 birds being tested simultaneously) and ID (random effect because repeated measures on the same individuals). A Generalized Linear Mixed Model [GLMM; MCMCglmm function, MCMCglmm package; Hadfield (2010)] will be used with a Poisson distribution and log link using 13,000 iterations with a thinning interval of 10, a burnin of 3,000, and minimal priors (V=1, nu=0) (Hadfield, 2014). We will ensure the GLMM shows acceptable convergence [lag time autocorrelation values <0.01; Hadfield (2010)], and adjust parameters if necessary. We will determine whether an independent variable had an effect or not using the Estimate in the full model.
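The passing criterion for a preference can be illustrated with a short sketch. It assumes one reading of the description above: the window of the most recent 20 trials is re-evaluated after every trial, and the bird passes when it has at least 17 correct choices in that window with at least 8 correct in each 10-trial half. The function name `trials_to_criterion` and its parameters are hypothetical and are not taken from the authors' analysis code.

```python
def trials_to_criterion(correct, window=20, total_needed=17, half_needed=8):
    """Return the 1-based trial number at which the criterion is first met.

    `correct` is a list of 0/1 values, one per trial, in the order the
    trials occurred (ignoring session boundaries). Returns None if the
    criterion is never met.
    """
    half = window // 2
    for end in range(window, len(correct) + 1):
        recent = correct[end - window:end]          # most recent 20 trials
        first, second = recent[:half], recent[half:]
        if (sum(recent) >= total_needed
                and sum(first) >= half_needed
                and sum(second) >= half_needed):
            return end
    return None


# Hypothetical bird: 10 incorrect choices followed by 20 correct choices.
choices = [0] * 10 + [1] * 20
print(trials_to_criterion(choices))
```

Under this reading, the number returned is the response variable for that reversal: the number of trials the bird needed to reverse its preference.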
We did not need a power analysis to estimate our ability to detect actual effects because, by definition, the individuals that complete this experiment must get faster at reversing in order to be able to pass the stopping criterion (two consecutive reversals in 50 trials or fewer). According to previous grackle data (from the pilot birds, and from Santa Barbara: Logan, 2016), the fastest grackle passed their first reversal in 70 trials, which means that passing our serial reversal stopping criterion would require them to have improved their passing speed.
Unregistered analysis: Because the wooden multi-access box was added after in principle recommendation, we conducted an unregistered analysis to determine whether the plastic and wooden multi-access box results correlated with each other, which would indicate that these tests are interchangeable. We found that they did not statistically significantly correlate with each other on either variable measured: the average latency to attempt a new locus (switching; Pearson’s r=0.74, 89% confidence interval=0.02-0.95, t=2.18, df=4, p=0.09, n=6) or the total number of loci solved (problem solving; Pearson’s r=0.51, 89% confidence interval=0.03-0.80, t=1.86, df=10, p=0.09, n=12). Therefore, while the performance on the two multi-access boxes might not be completely independent as indicated by the high r values, the two boxes appear not to be completely interchangeable either as indicated by the lack of statistical significance and high uncertainty in the r values. We therefore analyzed the plastic and wooden multi-access boxes separately.
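As a rough consistency check on the reported statistics, the t value of a Pearson correlation can be recovered from r and the sample size using the standard relation t = r * sqrt(df / (1 - r^2)) with df = n - 2. The sketch below is only an illustration with the rounded r values from the text; the helper name `t_from_r` is hypothetical, and small differences from the reported t values are expected because r is rounded to two decimals.

```python
import math


def t_from_r(r, n):
    """Return (t, df) for a Pearson correlation r computed from n pairs."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Values as reported in the text (rounded r, so t is approximate).
t_latency, df_latency = t_from_r(0.74, 6)    # latency to attempt a new locus
t_loci, df_loci = t_from_r(0.51, 12)         # total number of loci solved
print(round(t_latency, 2), df_latency)
print(round(t_loci, 2), df_loci)
```

Both recomputed values fall within a few hundredths of the reported t=2.18 and t=1.86, and the degrees of freedom match.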
Planned analyses: As originally planned, we replaced the GLMs and GLMMs in May 2020 with more powerful models after learning how to make bespoke Bayesian models from McElreath (2016). We made these models before analyzing the actual data (14 May 2020).
One model was run per response variable: average latency to attempt to solve a new locus after solving a different locus, and total number of loci solved. Explanatory variable: Number of trials to reverse a preference in the last reversal. Random variable: batch.
The model for the number of loci solved takes the form of:
locisolved ~ Binomial(4, p) [likelihood]
logit(p) ~ \(\alpha\)[batch] + \(\beta\)trials [model]
locisolved is the number of loci solved on the multi-access box, 4 is the total number of loci on the multi-access box, p is the probability of solving any one locus across the whole experiment, \(\alpha\) is the intercept and each batch gets its own, \(\beta\) is the expected amount of change in the log-odds of solving a locus for every one unit change in trials, and trials is the number of trials to reverse a color preference. See Supplementary Material 3 for more model details.
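To make the structure of this model concrete, the sketch below evaluates the linear predictor on the logit scale, converts it to a probability, and computes the implied expected number of loci solved. It is not the fitted Bayesian model: the parameter values are made up for illustration, and the function names are hypothetical.

```python
import math


def inv_logit(x):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def solve_probability(alpha_batch, beta, trials):
    """p for one bird: logit(p) = alpha[batch] + beta * trials."""
    return inv_logit(alpha_batch + beta * trials)


def binomial_pmf(k, n, p):
    """Probability of solving exactly k of n loci under Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical parameter values, chosen only for illustration.
alpha_batch, beta = 1.0, -0.02
for trials in (40, 80):
    p = solve_probability(alpha_batch, beta, trials)
    expected = 4 * p  # mean of Binomial(4, p)
    print(trials, round(p, 3), round(expected, 2))
```

Because the linear predictor is on the log-odds scale, a negative \(\beta\) means that birds needing more trials to reverse are expected to solve fewer loci, but the change in expected loci per additional trial is not constant.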
The model for the latency to switch options takes the form of:
latency ~ gamma-Poisson(\(\lambda_i\), \(\phi\)) [likelihood]
log(\(\lambda_i\)) ~ \(\alpha\)[batch] + \(\beta\)trials [model]
latency is the average latency to attempt a new locus on the multi-access box, \(\lambda_i\) is the rate (probability of attempting a locus in each second) per bird (and we take the log of it to make sure it is always positive; birds with a higher rate have a smaller latency), \(\phi\) is the dispersion of the rates across birds, \(\alpha\) is the intercept for the rate per batch, \(\beta\) is the expected amount of change in the rate of attempting to solve in any given second for every one unit change in trials, and trials is the number of trials to reverse a color preference. See Supplementary Material 6 for more model details.
Deviations from the plan:
Planned analysis: We ran one model per response variable: Number of trials to attempt a new locus on the multi-access boxes, and number of trials to solve (meet criterion) a new locus on the multi-access boxes. Explanatory variables: Number of trials to reverse a preference in the last reversal that individual participated in, motor diversity: the number of different motor actions used when attempting to solve the multi-access boxes. Random variable: ID (random because repeated measures on the same individuals). A Generalized Linear Mixed Model [GLMM; MCMCglmm function, MCMCglmm package; Hadfield (2010)] will be used with a Poisson distribution and log link using 13,000 iterations with a thinning interval of 10, a burnin of 3,000, and minimal priors (V=1, nu=0) (Hadfield, 2014). We ensured the GLMM showed acceptable convergence [lag time autocorrelation values <0.01; Hadfield (2010)] by adjusting parameters if necessary. We determined whether an independent variable had an effect or not using the Estimate in the full model.
Analysis 1 (qualitative): Learning strategies were identified by matching them to the two known approximate strategies of the contextual, binary multi-armed bandit: epsilon-first and epsilon-decreasing (McInerney, 2010; as in Logan, 2016). We used the criterion for the epsilon-first strategy of learning the correct choice after one trial and then choosing correctly thereafter. Other patterns were classified as the epsilon-decreasing strategy. This method of qualitative inspection of learning curves is standard for this type of learning strategy assessment (McInerney, 2010). The variable for visual inspection was the proportion of correct choices in a non-overlapping sliding window of 4-trial bins across the total number of trials required to reach the criterion of 17/20 correct choices per individual.
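The variable used for visual inspection can be computed with a few lines of code. This sketch assumes that each bird's choices are stored as a list of 0/1 values in trial order; the function names are hypothetical, and `is_epsilon_first` encodes a strict reading of the criterion stated above (every choice after the first trial is correct), which the authors assessed by inspecting learning curves rather than by a rule like this.

```python
def proportion_correct_by_bin(correct, bin_size=4):
    """Proportion of correct choices in consecutive, non-overlapping bins.

    A final bin with fewer than `bin_size` trials is kept and scored on
    the trials it contains.
    """
    bins = [correct[i:i + bin_size] for i in range(0, len(correct), bin_size)]
    return [sum(b) / len(b) for b in bins]


def is_epsilon_first(correct):
    """Strict reading: correct on every trial after the first one."""
    return len(correct) > 1 and all(correct[1:])


# Hypothetical choices for one reversal (1 = rewarded option chosen).
choices = [0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
print(proportion_correct_by_bin(choices))
print(is_epsilon_first(choices))
```

Plotting the binned proportions against bin number gives the learning curve: a gradual rise is consistent with epsilon-decreasing, while an immediate jump to 1.0 is what epsilon-first would look like.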
From Logan (2016) (emphasis added):
The following equations refer to the different phases involved in each strategy:
Equation 1 (exploration phase): \[\epsilon N\]
Equation 2 (exploitation phase): \[ ( 1 - \epsilon ) N \]
N is the number of trials given, and epsilon, \(\epsilon\), represents the subject’s uncertainty about the location of the reward, starting at complete uncertainty (\(\epsilon\) = 1) at the beginning of the experiment and decreasing rapidly as individuals gain experience with the task (exploration phase where the rewarded [option] is chosen below or at chance levels) and switch to the exploitative phase (the rewarded [option] is chosen significantly above chance levels). Because the [subjects] needed to learn the rules of the task, they necessarily had an exploration phase. The epsilon-first strategy involves an exploration phase followed by an entirely exploitative phase. The optimal strategy overall would be to explore one color in the first trial and the other color in the second trial, and then switch to an exploitative strategy (choose the rewarded [option] significantly above chance levels). In this case there would be no pattern [in the learning curve] in the choices [during] the exploration phase because it would consist of sampling each [option] only once. In the epsilon-decreasing strategy, subjects would start by making some incorrect choices and then increase their choice of the rewarded [option] gradually as their uncertainty decreases until they choose the rewarded [option] significantly above chance levels. In this case, a linear pattern emerges (in the learning curve) during the exploration phase.
Analysis 2 (quantitative): We then quantitatively determined to what degree each bird used the exploration versus exploitation strategy using methods in Federspiel et al. (2017) by calculating the number of 20-trial blocks where birds were choosing “randomly” (6-14 correct choices; called sampling blocks; akin to the exploration phase above) and dividing it by the total number of blocks to reach criterion per bird. This ratio was also calculated for “acquisition” blocks where birds made primarily correct choices (15-20 correct choices; akin to the exploitation phase above). These ratios, calculated for each bird for their serial reversals, quantitatively discern the exploration from the exploitation phases.
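The block ratios described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: choices are assumed to be a list of 0/1 values in trial order, only complete 20-trial blocks are counted, and the function name `block_ratios` is hypothetical. Blocks with 0-5 correct choices fall into neither category in the description above, so they count toward the total but toward neither ratio here.

```python
def block_ratios(correct, block_size=20):
    """Return (sampling_ratio, acquisition_ratio) for one bird.

    Sampling blocks have 6-14 correct choices and acquisition blocks have
    15-20 correct choices; each count is divided by the total number of
    complete blocks.
    """
    n_blocks = len(correct) // block_size
    if n_blocks == 0:
        raise ValueError("need at least one complete block of trials")
    totals = [sum(correct[i * block_size:(i + 1) * block_size])
              for i in range(n_blocks)]
    sampling = sum(1 for t in totals if 6 <= t <= 14)
    acquisition = sum(1 for t in totals if t >= 15)
    return sampling / n_blocks, acquisition / n_blocks


# Hypothetical bird: two near-chance blocks followed by one mostly correct block.
choices = [1, 0] * 10 + [1] * 12 + [0] * 8 + [1] * 18 + [0] * 2
print(block_ratios(choices))
```

For this hypothetical bird, two of three blocks are sampling blocks and one is an acquisition block.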
In the middle of data collection
10 April 2019: We discontinued the reversal learning experiment on the touchscreen because it appears to measure something other than what we intended to test and it requires a huge time investment for each bird (which consequently reduces the number of other tests they are available to participate in). This is not necessarily surprising because this is the first time touchscreen tests have been conducted in this species, and also the first time (to our knowledge) this particular reversal experiment has been conducted on a touchscreen with birds. We based this decision on data from four grackles (2 in the flexibility manipulation group and 2 in the flexibility control group; 3 males and 1 female). All four of these individuals showed highly inconsistent learning curves and required hundreds more trials to form each preference when compared to the performance of these individuals on the colored tube reversal experiment. It appears that there is a confounding variable with the touchscreen such that they are extremely slow to learn a preference as indicated by passing our criterion of 17 correct trials out of the most recent 20. We will not include the data from this experiment when conducting the cross-test comparisons in the Analysis Plan section of the preregistration. Instead, in the Results section, we provide summary results for this experiment and, in the Discussion, qualitatively compare it with performance on the colored tube reversal test to explain what might have confounded the touchscreen experiment.
16 April 2019: Because we discontinued the touchscreen reversal learning experiment, we added an additional but distinct multi-access box task, which allowed us to continue to measure flexibility across three different experiments. There are two main differences between the first multi-access box, which is made of plastic, and the new multi-access box, which is made of wood. First, the wooden multi-access box is a natural log in which we carved out 4 compartments. As a result, the apparatus and solving options are more comparable to what grackles experience in the wild, though each compartment is covered by a transparent plastic door that requires different behaviors to open. Second, there is only one food item available in the plastic multi-access box and the bird could use any of 4 loci to reach it. In contrast, the wooden multi-access box has a piece of food in each of the 4 separate compartments.
Post data collection, pre-data analysis
We completed our simulation to explore the lower boundary of a minimum sample size and determined that our sample size for the Arizona study site is above the minimum (see details and code in Supplementary Material 1; 17 April 2020).
Please see the Alternative Analyses section in the preregistration, where we stated that we would learn and implement Bayesian models. This resulted in our changing the analysis for P2: we replaced the original analysis with the new models described in the Ability to detect actual effects section (Supplementary Material 1; 14 May 2020). We also describe in SM1 that we realized that Condition (manipulated or control) does not need to be a variable in our models because the manipulated birds have, by definition, faster reversal speeds.
We originally planned on testing only adults to have a better understanding of what the species is capable of, assuming the abilities we are testing are at their optimal levels in adulthood, and so we could increase our statistical power by eliminating the need to include age as an independent variable in the models. Because the grackles in Arizona were extremely difficult to catch, we ended up testing two juveniles: Taco and Chilaquile. We did not conduct the full test battery with Taco or put him in the flexibility manipulation or control groups (he received 1 reversal and then moved on to the next test) because he was the first juvenile and we wanted to see whether his performance was different from adult performances. His performances were similar to the adults, therefore we decided to put Chilaquile in the full test battery. Chilaquile’s performances were also similar to the adults, therefore we decided not to add age as an independent variable in the models to avoid reducing our statistical power.
Post data collection, mid-data analysis
Data are publicly available at the Knowledge Network for Biocomplexity (C. Logan et al., 2022). Although 22 grackles completed their initial colored tube discrimination, only 20 grackles participated in one or more reversals (Table SM5). The rest of the tests began only after a bird’s reversal experiment was complete (C. Logan et al., 2022).
The birds in the manipulated group required a similar number of trials during their first reversal (R1 median=75 trials) as the birds in the control group needed during their first and only reversal (R1 median=70 trials) (see unregistered analysis in Table 1). The manipulated birds improved during the reversal manipulation to a median of 40 trials in their last reversal: there was a significant negative correlation between the number of trials to reverse (average=71 trials, standard deviation (sd)=28, Table 2) and the reversal number for those grackles in the flexibility manipulation condition (n=9, which included Memela who did not pass the manipulation condition of passing two consecutive reversals in 50 trials or less; Figure 3).
Table 1. Unregistered analysis: the number of trials to reverse in the first reversal is similar between the manipulated and control groups.
| | Posterior mean | Lower 89 percentile compatibility interval (5.5%) | Upper 89 percentile compatibility interval (94.5%) | Effective sample size | pMCMC | Significance code: **=0.01 |
|---|---|---|---|---|---|---|
| Intercept | 4.29 | 4.12 | 4.46 | 420 | <0.002 | ** |
| Manipulation Condition | -0.08 | -0.27 | 0.11 | 420 | 0.46 | |
Table 2. The number of trials to reverse decreases with increasing reversal number.
| | Posterior mean | Lower 89 percentile compatibility interval (5.5%) | Upper 89 percentile compatibility interval (94.5%) | Effective sample size | pMCMC | Significance code: **=0.01 |
|---|---|---|---|---|---|---|
| Intercept | 4.44 | 4.31 | 4.62 | 420 | <0.002 | ** |
| Reverse Number | -0.06 | -0.10 | -0.03 | 420 | <0.002 | ** |
Figure 3. Individuals in the manipulated condition (who received serial reversals) did not linearly decrease their reversal passing speeds with increasing reversal number (n=9 grackles).
Unregistered analysis 1: There was additionally a difference between manipulated and control reversal speeds when comparing their last reversals (Figure 4; for the control birds, their last reversal was their first reversal; Table 3). This analysis includes 19 grackles (8 manipulated condition - only those who actually passed the manipulation, 11 control condition) who had an overall average of 62 trials in their last reversal (sd=32).
Figure 4. Individuals in the manipulated condition (who received serial reversals) passed their last reversal in fewer trials than individuals in the control condition (who only received 1 reversal). n=19 grackles: 11=control, 8=manipulated.
Table 3. Individuals in the manipulated condition pass their last reversal in fewer trials than control individuals.
| | Posterior mean | Lower 89 percentile compatibility interval (5.5%) | Upper 89 percentile compatibility interval (94.5%) | Effective sample size | pMCMC | Significance code: **=0.01 |
|---|---|---|---|---|---|---|
| Intercept | 4.28 | 4.08 | 4.48 | 420 | <0.002 | ** |
| Reverse Number | -0.51 | -0.81 | -0.22 | 420 | 0.010 | ** |
Unregistered analysis 2: A pooled model of performance across all reversals estimates that birds can expect to improve by about 30 trials (89% percentile interval (PI): 25-36; Table SM3: Model 15) after completing the serial reversals. While all manipulated birds improved, those birds that were already fast to reverse in their first reversal improved less than the birds that required many trials to reverse in their first reversal (posterior peak indicates a correlation of +0.64, with highest posterior density intervals (HPDI) all positive, between the first reversal value and the improvement achieved by the last reversal; Table SM3: Model 16). However, the birds who were the fastest in the first reversal were also the fastest in the last reversal, although the difference between the slower and faster reversers was reduced (Figure 5).
Figure 5. All eight manipulated birds needed fewer trials to reverse in their last reversal than in their first. Their improvement depended on their starting value, with steeper slopes for those birds that needed more trials to reverse in the first reversal (blue = observed values and changes, black = model estimates). However, birds who needed more trials in the first reversal did not completely catch up, such that the birds that needed more trials in their first reversal also needed more trials in their last reversal relative to other grackles.
To determine whether the serial reversal manipulation affected flexibility generally, we compared performance (the number of trials to reverse a preference in the first and last color reversal, performance of the manipulated group relative to the control group) to speed of solution switching on two multi-access boxes. Furthermore, we assessed whether flexibility measured through these serial reversals related to innovativeness by comparing performance to the number of loci solved on the multi-access boxes. The results for each of these comparisons are described in detail below and an overview is provided in Figure 6.
Figure 6. Overview of the results from the P2 analyses with the multi-access boxes (plastic and wooden). An effect of natural variation in flexibility on performance on the multi-access box tasks would result in correlations in the first reversal. An effect of the flexibility manipulation would result in a change in correlations from the first to last reversals. A plus sign (+) indicates a positive correlation, a minus sign (-) indicates a negative correlation, and a 0 indicates no correlation between the two variables. The asterisks (*) indicate that a small sample size decreases the reliability of this result.
Grackles that were faster to reverse a preference in their last
reversal (average 52 trials, sd=23), where grackles in the
control condition received only one reversal which served as their first
and last reversal, were also faster to attempt to solve a new locus on
the plastic multi-access box (after just having passed criterion on a
different locus; average=208 seconds, sd=226; Figure 7a; Table SM3:
Model 9; n=11 grackles: 6 in manipulated condition, 5 in control
condition; 6 subjects completed this experiment but solved 0 loci or 1
locus and so did not have switching times). We also found that
individuals in the flexibility manipulation had faster switch latencies
than those in the control condition (Table SM3: Model 10). There was a
positive correlation between the number of trials to reverse in the
first reversal (average=70 trials, sd=21) and the
average switch latency on the plastic multi-access box (Table SM3: Model
11). A correlation was determined to be present if the compatibility
interval for the slope (b) in the model output did not cross zero (Table
SM3). This criterion was used throughout the analyses for P2.
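The decision rule above (a correlation is considered present when the compatibility interval for the slope does not cross zero) can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The function names, the linear-interpolation quantile, and the example sample values are hypothetical; in practice the samples would be posterior draws of the slope b from the fitted model.

```python
def percentile_interval(samples, mass=0.89):
    """Central percentile interval (5.5% and 94.5% bounds for mass=0.89)."""
    ordered = sorted(samples)
    lower_q = (1.0 - mass) / 2.0
    upper_q = 1.0 - lower_q

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        position = q * (len(ordered) - 1)
        below = int(position)
        above = min(below + 1, len(ordered) - 1)
        fraction = position - below
        return ordered[below] * (1.0 - fraction) + ordered[above] * fraction

    return quantile(lower_q), quantile(upper_q)


def excludes_zero(interval):
    """True if the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


# Hypothetical posterior draws: one slope clearly positive, one centered on zero.
positive_slope = [i / 100 for i in range(1, 101)]
centered_slope = [-0.5 + i / 100 for i in range(101)]

positive_present = excludes_zero(percentile_interval(positive_slope))
centered_present = excludes_zero(percentile_interval(centered_slope))
```

Here `positive_present` is `True` and `centered_present` is `False`, mirroring how a slope is or is not counted as a correlation under this criterion.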
Figure 7. The average latency (seconds) to attempt to solve a different locus after having previously successfully solved a locus on a) the plastic multi-access box (MAB) is positively correlated with the number of trials to pass their last reversal (n = 11 grackles), but on b) the wooden MAB it is not correlated with the number of trials to pass their last reversal (n = 11 grackles). Additionally, the probability of solving a locus on c) the plastic MAB is negatively correlated with the number of trials to pass their last reversal (n = 15 grackles), but on d) the wooden MAB it is not correlated with the number of trials to pass their last reversal (n = 12 grackles, estimate of slope includes zero). Shading represents the 89 percentile compatibility intervals.
There was no correlation between the number of trials to reverse a
preference in their last reversal (average 60 trials,
sd=38) and the latency to attempt to solve a new locus on the wooden
multi-access box (after just having passed criterion on a different
locus; average=463 seconds, sd=481; Figure 7b; Table SM3: Model 12; n=11
grackles: 5 in manipulated condition, 6 in control condition; Diablo
also completed this experiment and solved 1 locus, but did not attempt
another locus after that, thus he does not have any switching times to
analyze). We additionally found that there was no difference in the
average latency to switch between individuals in the flexibility
manipulation and those in the control condition (Table SM3: Model 13).
There was a negative correlation between the number of trials to reverse
in the first reversal (average=73 trials, sd=34) and
the average switch latency on the multi-access box (Table SM3: Model
14).
Grackles that were faster to reverse a preference in their last reversal (average 62 trials, sd=34) solved more loci on the plastic multi-access box (average=2 loci, sd=1.6; Figure 7c; Table SM3: Model 2; n=15 grackles: 6 in manipulated condition, 9 in control condition; this number excludes Mole and Habanero who were, due to experimenter error, given the fully put together box during habituation and could have learned how to solve the loci at that time). There was no correlation between the number of loci solved and which reversal condition a grackle was randomly assigned to (Table SM3: Model 4). There was also no correlation between the number of trials to reverse in the first reversal (average=75 trials, sd=31) and the number of loci solved on the multi-access box (Table SM3: Model 5).
The compatibility interval for the estimate for the association (mean
beta -0.41) between the number of loci solved on the wooden multi-access
box (average=3.2, sd=1.3) and the number of trials to reverse a
preference in their last reversal (average=59 trials,
sd=38) crossed zero (Figure 7d; Model 6, Table SM3; n=12 grackles: 6 in
manipulated condition, 6 in control condition). This could mean that
there is no association, however simulations in Supplementary Material 1
showed that we would not be able to reliably distinguish whether a small
effect is different from zero with our sample size (with a simulated
beta of -1 and an sd in the number of trials >10, the compatibility
interval of the estimate crossed zero in all simulations; Table SM1.2).
We did find a correlation between the number of loci solved and which
reversal condition a grackle was randomly assigned to, indicating the
reversal manipulation appears to have affected performance on the wooden
multi-access box. The model estimates that manipulated birds solved on
average 1.2 more loci than birds in the control condition (Table SM3:
Model 7, wooden; 89% compatibility intervals=0.34-2.14; n=12 grackles: 6
in manipulated condition, 6 in control condition). However, there is no
association between the number of trials to reverse in the first
reversal (average=74 trials, sd=34) and the number of loci solved on the
multi-access box (Table SM3: Model 8, wooden).
Because there was no correlation between the number of trials to reverse in the last reversal and the latency to attempt a different locus on the wooden multi-access box, we conducted this additional analysis to determine whether the model fit was improved when adding the number of motor actions as an explanatory variable. Adding the number of motor actions (wooden: average=13, sd=4) did not improve the model fit when examining the relationship between the latency to switch loci on the wooden multi-access box (wooden: average=463, sd=481) and the number of trials to reverse in the last reversal (wooden: average=60, sd=38) because the Akaike weights were similar for both models (wooden: n=11 grackles: 5 in the manipulated group, 6 in the control group; Table 4).
Table 4. Adding the number of motor actions used to the analysis of the average latency to attempt a new option on the wooden multi-access box and the number of trials to reverse in the last reversal does not improve the model fit.
| Intercept | Motor actions (wooden) | Trials last reversal | df | log likelihood | AICc | delta | weight |
|---|---|---|---|---|---|---|---|
| 463.2 | NA | NA | 2 | -83.025 | 171.6 | 0.00 | 0.674 |
| 934.6 | -35.28 | NA | 3 | -82.477 | 174.4 | 2.83 | 0.164 |
| 665.8 | NA | -3.362 | 3 | -82.631 | 174.7 | 3.14 | 0.140 |
| 1250.0 | -40.68 | -4.040 | 4 | -81.850 | 178.4 | 6.82 | 0.022 |
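The weight column in Table 4 follows from the delta column through the standard Akaike weight formula, in which each model's weight is exp(-delta/2) divided by the sum of that quantity over all models. The sketch below reproduces the reported weights from the reported deltas. The function and variable names are illustrative and not taken from the authors' code.

```python
import math


def akaike_weights(deltas):
    """Convert AICc differences (delta = AICc - min AICc) into Akaike weights."""
    relative_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative_likelihoods)
    return [value / total for value in relative_likelihoods]


# Delta values as reported in Table 4 (null model first).
table4_deltas = [0.00, 2.83, 3.14, 6.82]
table4_weights = [round(w, 3) for w in akaike_weights(table4_deltas)]
```

Rounded to three decimals, the computed weights are 0.674, 0.164, 0.140, and 0.022, which agree with the values in the table.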
Analysis 1 (qualitative): Using the criterion for the epsilon-first strategy of learning the correct choice after one trial and then choosing correctly thereafter, no grackle in this study used this strategy in any reversal. All grackles used an epsilon-decreasing strategy in all reversals (Figure 8 and Supplementary Material 6). We use Burrito’s figures to illustrate the epsilon-decreasing strategy (Figure 8): the proportion of trials he gets correct wanders up and down (epsilon-decreasing) until an asymptote at 0.8 is reached and held.
Figure 8. Burrito’s proportion of trials correct by trial number and reversal showing the epsilon-decreasing learning strategy where options are explored before forming a preference.
Analysis 2 (quantitative): We additionally quantitatively determined to what degree each bird used the exploration versus exploitation strategy using methods in Federspiel et al. (2017) by calculating the number of 10-trial blocks where birds were choosing “randomly” (2-9 correct choices; called sampling blocks; akin to the exploration strategy) divided by the total number of blocks to reach criterion per bird. This ratio was also calculated for “acquisition” blocks where birds made primarily correct choices (9-10 correct choices; akin to the exploitation strategy). There was no correlation between exploration (sampling ratio) or exploitation (acquisition ratio) and reversal number (sampling: reversal estimate=-0.09, SE=0.11, z=-0.86, p=0.39; acquisition: reversal estimate=0.00, SE=0.00, z=-0, p=1.00), indicating that the grackles did not use a particular strategy earlier or later in their serial reversals.
We conducted a controlled experiment to evaluate whether serial reversal learning affected flexibility and innovativeness in new contexts. We found that the number of trials to reverse decreased with increasing reversal number, and, when examining last reversals, there was a difference between the manipulated and control groups. This indicates that the flexibility manipulation was effective in that it manipulated reversal learning speeds, suggesting that these individuals shifted toward a “win-stay, lose-shift” rule to learn to reverse faster after more experience with reversing (Spence, 1936; J. Warren, 1965; J. M. Warren, 1965). The manipulated individuals who increased their reversal learning speed were then apparently able to apply this to a new context, which resulted in better performance when compared with control individuals who did not have the opportunity to learn. Previous research has also exploited the fact that most individuals can learn to learn and has used serial reversals to show that such experience usually improves performance when transferring to reversals involving different stimuli (e.g., visual vs. spatial, visual vs. visual in a new combination) (Rayburn-Reeves et al., 2013; Schusterman, 1962; J. Warren, 1965, 1966).
While performance differed between the two multi-access boxes, the serial reversal flexibility manipulation did affect flexibility in a new context, as well as innovativeness. Grackles that were faster to reverse a preference in their first and last reversals, and those in the manipulated condition, were also faster to attempt to solve a new locus on the plastic multi-access box. Similarly, the flexibility manipulation affected innovativeness because grackles in the manipulated condition solved on average 1.2 more loci on the wooden multi-access box than those birds in the control condition and there was a negative correlation between the number of loci solved on the plastic multi-access box and the number of trials to reverse in the last reversal. That our results were not consistent across first reversal, last reversal, and condition (Figure 6) on the two different multi-access boxes could be due to the small sample sizes because even in the control group there were several individuals who solved their first and only reversal in very few trials. Furthermore, the lack of correlation between the number of trials to reverse in the first reversal and the number of loci solved on either multi-access box indicates that flexibility is not an inherently utilized tool, but one that is shaped by experience. If it was an inherently utilized tool, the variation in the number of trials to complete first reversals would likely have resulted in a correlation with the number of loci solved.
Our results are in contrast with previous research on the correlation between flexibility performance, using serial reversals, and innovation: Indian mynas that were faster to reverse, were slower to innovate (Griffin et al., 2013). However, the Griffin et al. (2013) investigation was designed to evaluate the correlation between the variables and not whether manipulating flexibility using serial reversals influenced innovativeness. This difference could explain the differing results because correlational research can become noisy if there are unmeasured variables, which is something that a manipulation can help reduce. Other potential reasons for the difference in results could be due to using different experimental designs, and/or different serial reversal passing criteria (Griffin et al., 2013 used a preset number of reversals that resulted in a maximum of four reversals).
None of the flexibility manipulated individuals converged on using an epsilon-first learning strategy (learn the correct choice after one trial) as they progressed through serial reversals. All used the epsilon-decreasing strategy (explore options before forming a preference) throughout their reversals. Additionally, no grackle used a particular exploitation or exploration strategy earlier or later in their reversals. Learning theory on serial reversal experiments predicts that all individuals in the manipulated group shifted toward the “win-stay, lose-shift” rule because their reversal speeds improved (Spence, 1936; J. Warren, 1965; J. M. Warren, 1965). In contrast, learning theory on multi-armed bandit (a paradigm often used in reversal learning) decision making has a stricter criterion, predicting that the optimal strategy is to maximize the cumulative reward, which, in this case would result in individuals using the epsilon-first learning strategy immediately after the first trial (McInerney, 2010). Both learning theories consider one trial learning the optimal solution. Perhaps these wild-caught grackles relied solely on the epsilon-decreasing strategy because these individuals are used to an environment where information about the probability of what the optimal options are varies (McInerney, 2010). Therefore, maximizing information gain via continued exploration of the available options is likely of more use in the less predictable environment in the wild. Other investigations of the exploitation vs. exploration learning strategies involved in reversal learning have found that these strategies can vary by individual and relate to differences in reversal performance. For example, urban common mynas were slower to reverse a preference than rural mynas because they spent more time exploring their options (Federspiel et al., 2017). Perhaps we found no such differences in the grackles because all of the individuals we tested came from an urban area. 
If a rural population of grackles could be found, it would be interesting to compare learning strategy use between rural and urban individuals.
We assumed that reversal learning performance using shape on the touchscreen would directly compare to and be interchangeable with reversal learning performance using colored tubes. However, it quickly became clear that the touchscreen experiment may have been asking a different question compared with the traditional reversal learning approach using physical objects. Unfortunately, we did not have the time to explore what might have caused the differences between the two tests, but we speculate below. We conclude that these two methods, the traditional physical object and the touchscreen, do not measure the same construct in this species and with this reversal learning experiment.
One possible explanation for the difference between the two experiments is that grackles might require more trials to learn to discriminate between shapes than between colors. Shapes are known to require a few more trials for a preference to develop (e.g., Shaw et al., 2015: mean=40 trials color, mean=55 trials shape in toutouwai; Isden et al., 2013: mean=6 trials color, mean=10 trials shape in spotted bowerbirds), however grackles required hundreds more trials to learn shapes, therefore this explanation seems unlikely. Alternatively, grackles may not have understood how the touchscreen worked and therefore it was the apparatus that interfered with their performance, yet grackles successfully completed a go no-go inhibition task using the same touchscreen apparatus (Logan et al., 2021). The go no-go task similarly used two different white shapes (wavy lines or a heart), but the shapes were presented sequentially rather than simultaneously (as in the reversal touchscreen experiment). Given this difference between the two touchscreen experiments, it is possible that the grackles found touching the screen in the reversal experiment rewarding in and of itself because something happened whenever they made a response. That is, if they touched the correct stimulus, they received food; if they touched the incorrect stimulus, the screen went blank immediately. This is in contrast with the go no-go experiment where the stimulus stayed on the screen for a set amount of time after an incorrect choice. Another potential reason for the difference between performances on the two touchscreen experiments was that making the incorrect choice in the reversal experiment was not costly enough. In the reversal touchscreen experiment, they could get through many trials, receiving some rewards, in a short amount of time. Consequently, there was potentially not enough incentive to learn quickly, thus explaining the differences in learning speeds between the two reversal experiments.
We are not the first group to attempt to transfer a traditional lab or field task to a touchscreen apparatus (e.g., Drayton & Santos, 2014). Despite some of the challenges associated with touchscreen apparatuses, other attempts to transfer tasks to a touchscreen have been more successful (e.g., Blaisdell & Cook, 2005; Kangas & Bergman, 2017; Sawa et al., 2005). We maintain that touchscreens have the potential to be an incredibly useful tool for studying comparative cognition in some systems (for reviews and methods, see Bussey et al., 2008; Cook et al., 2004; Kangas & Bergman, 2017; Logan et al., 2021; Seitz et al., 2021; Wolf et al., 2014).
We demonstrate that it is possible to manipulate flexibility, using a paradigm such as reversal learning, to examine its direct link with other traits. This opens up many opportunities for future research to better understand what flexibility is and whether and how it is causally related to other behaviors or forms of cognition. Understanding how flexibility causally relates to other traits will allow researchers to develop robust theory about the mechanisms and functional impact of flexibility, and when to invoke it as a primary driver in a given context, such as a rapid geographic range expansion. Indeed, we are already in the process of testing the latter hypothesis by conducting cross-population research on great-tailed grackles to test whether a population on the range edge is more flexible (Logan CJ et al., 2020). That we were able to manipulate flexibility, which had causal effects on flexible behavior in a different context (multi-access box) as well as a different cognitive ability (innovativeness), demonstrates that flexibility manipulations could be useful in training individuals of other species in how to be more flexible. This could have important implications for threatened and endangered taxa (such as informing the choice of individuals for captive breeding or introduction programs where individuals or their offspring are released into novel areas), as well as for habituating zoo animals or other managed populations to novelty. If such a flexibility manipulation was successful, it could then change their behavior in this and other domains, giving them a better chance of succeeding in human modified environments. This is the focus of our new research program, ManyIndividuals, where we manipulate flexibility using serial reversals in the wild in species that are successful and at risk and determine whether the manipulation improves their success in human modified environments (Logan et al., 2022).
This research is carried out in accordance with permits from the:
This research is funded by the Department of Human Behavior, Ecology and Culture at the Max Planck Institute for Evolutionary Anthropology (2017-current), and by a Leverhulme Early Career Research Fellowship to Logan (2017-2018).
We, the authors, declare that we have no financial conflicts of interest with the content of this article. CJ Logan and D Lukas are Recommenders at PCI Ecology, and Logan used to be on the Managing Board (2018-2022).
We thank our PCI Ecology recommender, Aurélie Coulon, and reviewers, Maxime Dahirel and Andrea Griffin, for their feedback on the preregistration and post-study manuscript; Kevin Langergraber for serving as our ASU IACUC PI; Ben Trumble and Angela Bond for logistical support; Melissa Wilson for sponsoring our affiliations at Arizona State University and lending lab equipment; Kristine Johnson for technical advice on great-tailed grackles; Arizona State University School of Life Sciences Department Animal Care and Technologies for providing space for our aviaries and for their excellent support of our daily activities; Julia Cissewski for tirelessly solving problems involving financial transactions and contracts; Sophie Kaube for logistical support; Richard McElreath for project support; Aaron Blackwell and Ken Kosik for being the UCSB sponsors of the Cooperation Agreement with the Max Planck Institute for Evolutionary Anthropology; Tiana Lam, Anja Becker, and Brynna Hood for interobserver reliability video coding; Sawyer Lung for field support; Alexis Breen for coding multi-access box videos; and our research assistants: Aelin Mayer, Nancy Rodriguez, Brianna Thomas, Aldora Messinger, Elysia Mamola, Michael Guillen, Rita Barakat, Adriana Boderash, Olateju Ojekunle, August Sevchik, Justin Huynh, Jennifer Berens, Amanda Overholt, Michael Pickett, Sam Munoz, Sam Bowser, Emily Blackwell, Kaylee Delcid, Sofija Savic, Brynna Hood, Sierra Planck, and Elise Lange.
To begin to understand what kinds of effect sizes we will be able to detect given our sample size limitations and our interest in decreasing noise by attempting to measure it, which increases the number of explanatory variables, we used G*Power (v.3.1, Faul et al., 2007, 2009) to conduct power analyses based on confidence intervals. G*Power uses pre-set drop down menus and we chose the options that were as close to our analysis methods as possible (listed in each analysis below). Note that there were no explicit options for GLMs (though the chosen test in G*Power appears to align with GLMs) or GLMMs or for the inclusion of the number of trials per bird (which are generally large in our investigation), thus the power analyses are only an approximation of the kinds of effect sizes we can detect. We realize that these power analyses are not fully aligned with our study design and that these kinds of analyses are not appropriate for Bayesian statistics (e.g., our MCMCglmm below), however we were unaware of better options at that time. Additionally, it is difficult to run power analyses because it is unclear what kinds of effect sizes we should expect due to the lack of data on this species for these experiments.
To address the power analysis issues, we ran simulations on our Arizona data set before conducting any analyses in this preregistration.
Planned: We will first run null models (i.e., dependent variable ~ 1 + random effects), which will allow us to determine what a weak versus a strong effect is for each model. Then we will run simulations based on the null model to explore the boundaries of influences (e.g., sample size) on our ability to detect effects of interest of varying strengths. If simulation results indicate that our Arizona sample size is not larger than the lower boundary, we will continue these experiments at the next field site until we meet the minimum suggested sample size.
To run the simulations, we first constructed a hypothesis-appropriate mathematical model that encompassed the relationship between the variables of interest for each analysis: 1) number of loci solved on the multi-access box ~ trials to reverse, and 2) latency to attempt a new locus on the multi-access box ~ trials to reverse.
Simulation and model: number of loci solved on the multi-access box ~ trials to reverse
The model takes the form of:
locisolved ~ Binomial(4, p) [likelihood]
logit(p) ~ \(\alpha\)[batch] + \(\beta\)trials [model]
locisolved is the number of loci solved on the multi-access box, 4 is the total number of loci on the multi-access box, p is the probability of solving any one locus across the whole experiment, \(\alpha\) is the intercept and each batch gets its own, \(\beta\) is the expected amount of change in locisolved for every one unit change in trials, and trials is the number of trials to reverse a color preference.
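The generative side of this model can be sketched in a few lines of Python. This is a minimal illustration and not the simulation code used for the analyses: it assumes a single intercept in place of the per-batch intercepts, standardizes the simulated trials to reverse before applying \(\beta\), and uses illustrative values for `alpha` and `beta`. The function names are hypothetical.

```python
import math
import random

def inv_logit(x):
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_loci_solved(trials_std, alpha, beta, n_loci=4, rng=None):
    """Draw loci solved ~ Binomial(n_loci, p) with logit(p) = alpha + beta * trials."""
    rng = rng or random.Random()
    p = inv_logit(alpha + beta * trials_std)
    # A binomial draw is the number of successes in n_loci independent attempts.
    return sum(1 for _ in range(n_loci) if rng.random() < p)

rng = random.Random(1)
# Hypothetical birds: trials to reverse drawn from Normal(91, 21), then standardized
# (standardizing is an assumption of this sketch).
raw_trials = [rng.gauss(91, 21) for _ in range(15)]
mean_t = sum(raw_trials) / len(raw_trials)
sd_t = (sum((t - mean_t) ** 2 for t in raw_trials) / (len(raw_trials) - 1)) ** 0.5
trials_std = [(t - mean_t) / sd_t for t in raw_trials]

alpha, beta = 0.0, -1.0  # illustrative values only, not fitted estimates
loci = [simulate_loci_solved(t, alpha, beta, rng=rng) for t in trials_std]
```

With a negative `beta`, simulated birds that need more trials to reverse tend to solve fewer loci, which is the direction of effect the simulations were designed to detect.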
Expected values for the number of loci solved on the multi-access box were set to either 2 or 0 (out of 4 loci maximum) because we were unsure of whether the grackles would be able to solve any loci on the multi-access box because this experiment had never been done on this species before. Expected values for reversal learning using colored tubes (mean, standard deviation, and range of number of trials to reverse a color preference) were based on previously published data on great-tailed grackles (Logan, 2016). This data indicates that the average number of trials to reverse a preference is 91 and the standard deviation is 21. In our model, the variation in the actual data is reflected by both the population standard deviation and the expected amount of change related to the explanatory variable. After running simulations, we identified the following distributions and priors to be the most likely for our expected data:
\(\alpha\) ~ Normal(4,10) [\(\alpha\) prior]
\(\beta\) ~ Normal(0,5) [\(\beta\) prior]
We used normal distributions for \(\alpha\) and \(\beta\) because they are (or are based on) sums with large means (see Figure 10.6 in McElreath, 2018). For the \(\beta\) prior, we had no expectation about whether the relationship would be positive or negative, therefore we centered it on 0 (the mean).
Simulation and model: latency to attempt a new locus on the multi-access box ~ trials to reverse
For the average latency to attempt a new locus on the multi-access box as it relates to trials to reverse (both are measures of flexibility), we simulated data and set the model as follows:
latency ~ gamma-Poisson(\(\lambda_i\), \(\phi\)) [likelihood]
log(\(\lambda_i\)) ~ \(\alpha\)[batch] + \(\beta\)trials [the model]
latency is the average latency to attempt a new locus on the multi-access box, \(\lambda_i\) is the rate (probability of attempting a locus in each second) per bird (and we take the log of it to make sure it is always positive; birds with a higher rate have a smaller latency), \(\phi\) is the dispersion of the rates across birds, \(\alpha\) is the intercept for the rate per batch, \(\beta\) is the expected amount of change in the rate of attempting to solve in any given second for every one unit change in trials, and trials is the number of trials to reverse a color preference.
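As a rough illustration of how data can be generated under a gamma-Poisson likelihood with a log link, the following Python sketch draws latencies for a few hypothetical birds. It is not the code used for the analyses. It assumes one common parameterization in which \(\lambda_i\) is the mean of the distribution and the variance is \(\lambda_i + \lambda_i^2/\phi\), which may differ from the parameterization in the software used here, and it uses a single intercept and illustrative parameter values.

```python
import math
import random

def poisson_draw(rate, rng):
    """Poisson draw: count exponential inter-arrival times that fit in one time unit."""
    if rate <= 0:
        return 0
    count, elapsed = 0, rng.expovariate(rate)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

def gamma_poisson_draw(lam, phi, rng):
    """Gamma-Poisson draw with mean lam (assumed variance: lam + lam**2 / phi)."""
    rate = rng.gammavariate(phi, lam / phi)  # shape phi, scale lam / phi
    return poisson_draw(rate, rng)

def simulate_latency(trials_std, alpha, beta, phi, rng):
    """Apply log(lambda_i) = alpha + beta * trials, then draw a latency in seconds."""
    lam = math.exp(alpha + beta * trials_std)
    return gamma_poisson_draw(lam, phi, rng)

rng = random.Random(7)
alpha = math.log(300)  # illustrative: mean latency of 300 s at average trials to reverse
beta, phi = 0.3, 2.0   # illustrative values only, not fitted estimates
latencies = [simulate_latency(z, alpha, beta, phi, rng) for z in (-1.0, 0.0, 1.0)]
```

Smaller values of `phi` spread the simulated latencies more widely around the same mean, which is how the dispersion across birds enters the simulation.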
Expected values for the latency to attempt a new locus on the multi-access box were set to between 1 and 2700 sec because the experiment ends for a bird if they do not obtain the food in 3 consecutive trials, and each trial can last up to 15 min. Because we did not have prior data for this species on this test, we set the mean to 300 sec, which is halfway through a usual 10 min trial, because it seems likely that if a bird is going to attempt another locus, it will do so at the next opportunity, especially after being successful in the previous trial. Expected values for reversal learning using colored tubes are the same as above. After running simulations, we identified the following to be the most likely distributions and priors for our expected data:
\(\phi\) ~ 1/(Exponential(1)) [\(\phi\) prior]
\(\alpha\) ~ Normal(300,50) [\(\alpha\) prior]
\(\beta\) ~ Normal(0,5) [\(\beta\) prior]
We used a gamma-Poisson distribution for latency because it constrains the values to be positive and to primarily occur sooner rather than later, which is what we expect from the grackles (based on data from New Caledonian crows and kea in Auersperg et al., 2011). For \(\phi\), we used an exponential distribution because it is standard for this parameter. We used normal distributions for \(\alpha\) and \(\beta\) because they are (or are based on) sums with large means (see Figure 10.6 in McElreath, 2018). For the \(\beta\) prior, we had no expectation about whether the relationship would be positive or negative, therefore we centered it on 0 (the mean).
We translated the simulation output into effect sizes and examined what kind of effect size these parameter values represent (Table SM1.1). For each \(\beta\), we calculated the effect size (Box 13.3 in Lajeunesse et al., 2013: linear regression):
r = \(\beta\) (SDx / SDy) = \(\beta\) (1.5 / 21)
Where r is the Pearson product moment correlation and SD is the standard deviation. For the standard deviation of x (number of loci solved on the multi-access box), we estimated a possible value of 1.5. For the standard deviation of y (trials to reverse), we used 21 from the Santa Barbara grackle data (Logan, 2016). We then calculated the effect sizes and R2 values for each value of \(\beta\).
Table SM1.1. The connection between \(\beta\) and effect sizes (SDx=standard deviation of x, which is the number of loci solved; SDy=standard deviation of y, which is the number of trials to reverse; R2=R squared).
| Beta | SDx | SDy | Effect size | R2 |
|---|---|---|---|---|
| -5 | 1.5 | 21 | -0.357 | 0.128 |
| -1 | 1.5 | 21 | -0.071 | 0.005 |
| 0 | 1.5 | 21 | 0.000 | 0.000 |
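The values in Table SM1.1 follow directly from the formula r = \(\beta\) (SDx / SDy). The short Python sketch below reproduces them; the function name `effect_size` is hypothetical and the code is only an arithmetic check, not part of the original analysis.

```python
def effect_size(beta, sd_x=1.5, sd_y=21.0):
    """Convert a regression slope to a correlation: r = beta * (SDx / SDy)."""
    return beta * (sd_x / sd_y)

# One row per beta value in Table SM1.1: (beta, effect size r, R squared).
rows = []
for beta in (-5, -1, 0):
    r = effect_size(beta)
    rows.append((beta, round(r, 3), round(r ** 2, 3)))

for row in rows:
    print(row)
```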
We then used the simulations to run models on simulated data to estimate the measurement error associated with varying sample size, \(\beta\), and the range of multi-access box loci solved or latency to attempt a new locus (Table SM1.2). Before running the models, we decided that a model would detect an effect if 89% of the posterior sample was on the same side of zero (following McElreath, 2018). We ran the simulation with \(\beta\)=3 (latency) because this was a high value at which an appropriate range of values was observed in the simulation testing phase, \(\beta\)=0 because this would be the scenario in which there is no relationship between the response variable and the trials to reverse, and \(\beta\)=-1 to determine how small of a difference we can detect and with what amount of associated noise (\(\sigma\)). Sigma (\(\sigma\)) is the standard deviation in the trials to reverse if the trials to reverse is a normal distribution. In all simulations, the mean in the trials to reverse was set to 91. Therefore, a \(\sigma\) of 14 is 15% noise (14/91). We found that when \(\sigma\) is larger than 14, we cannot detect even the largest effect of trials to reverse on loci solved or latency because there are some simulations where the estimated regression coefficient crosses zero. When \(\beta\)=0 we want all of the regression coefficients to cross zero (10 out of 10 random repetitions) and when \(\beta\) \(\neq\) 0 we want none of the regression coefficients to cross zero (0 out of 10 random repetitions). We ran the models several times with various parameters to determine at what point this was the case for each combination of parameters.
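The detection rule and the noise calculation described above can be written out as a small Python sketch. This is an illustration under stated assumptions, not the code used for the simulations: the function names are hypothetical, and the posterior samples in the usage lines are made-up numbers rather than model output.

```python
def detects_effect(posterior_samples, threshold=0.89):
    """Return True if at least `threshold` of the samples lie on one side of zero."""
    n = len(posterior_samples)
    share_below = sum(1 for s in posterior_samples if s < 0) / n
    share_above = sum(1 for s in posterior_samples if s > 0) / n
    return max(share_below, share_above) >= threshold

def noise_fraction(sigma, mean_trials=91):
    """Express the standard deviation of trials to reverse as a share of the mean."""
    return sigma / mean_trials

# Made-up posterior samples for illustration only.
mostly_negative = [-1.0] * 90 + [1.0] * 10   # 90% below zero
evenly_split = [-1.0] * 50 + [1.0] * 50      # 50% on each side

print(detects_effect(mostly_negative))  # True
print(detects_effect(evenly_split))     # False
print(round(noise_fraction(14), 2))     # 0.15
```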
Table SM1.2. Simulation outputs from varying \(\beta\), sample size (n), \(\sigma\), and whether the actual range of multi-access box [MAB] loci solved was 0-2 or 0-4. We did not know how many loci the grackles would be able to solve before we started collecting data, so we ran two simulations. The grackles ended up being able to solve all four loci on both multi-access boxes, therefore we must use only those rows associated with “Range of MAB loci solved” = 0-4. This table is useful for the analyses involving the number of loci solved on the multi-access box, but not the latency to switch to attempting a new locus on the multi-access box, which uses a different (gamma-Poisson) model.
| Beta | n | Sigma | Regression coefficient crosses zero | Regression coefficient | Range of MAB loci solved |
|---|---|---|---|---|---|
| -5 | 15 | 15 | 1/10 | -5.90 | 0-4 |
| -5 | 15 | 14 | 0/10 | -5.11 | 0-4 |
| -5 | 15 | 12 | 0/10 | -4.79 | 0-4 |
| -5 | 15 | 10 | 0/10 | -4.31 | 0-4 |
| -5 | 10 | 10 | 1/10 | -4.35 | 0-4 |
| -5 | 10 | 9 | 0/10 | -5.26 | 0-4 |
| -5 | 8 | 10 | 1/10 | -5.35 | 0-4 |
| -5 | 8 | 9 | 0/10 | -4.22 | 0-4 |
| -5 | 8 | 8 | 0/10 | -3.08 | 0-4 |
| -5 | 8 | 8 | 1/10 | -4.74 | 0-2 |
| -5 | 8 | 7 | 3/10 | -6.74 | 0-2 |
| -5 | 8 | 5 | 0/10 | -3.08 | 0-2 |
| -5 | 10 | 9 | 3/10 | -4.51 | 0-2 |
| -5 | 10 | 7 | 1/10 | -7.67 | 0-2 |
| -5 | 10 | 6 | 2/10 | -5.16 | 0-2 |
| -5 | 10 | 5 | 1/10 | -4.57 | 0-2 |
| -5 | 10 | 4 | 0/10 | -5.02 | 0-2 |
| -5 | 15 | 14 | 2/10 | -3.07 | 0-2 |
| -5 | 15 | 13 | 5/10 | 1.68 | 0-2 |
| -5 | 15 | 10 | 5/10 | -8.20 | 0-2 |
| -5 | 15 | 8 | 3/10 | -4.01 | 0-2 |
| -5 | 15 | 6 | 0/10 | -6.03 | 0-2 |
| -5 | 15 | 7 | 1/10 | -8.06 | 0-2 |
| 0 | 15 | 14 | 10/10 | -3.23 | 0-2 |
| 0 | 15 | 14 | 10/10 | 0.43 | 0-4 |
| -1 | 15 | 14 | 10/10 | -1.53 | 0-4 |
| -1 | 15 | 10 | 10/10 | -0.73 | 0-4 |
| -1 | 15 | 5 | 3/10 | 0.19 | 0-4 |
| -1 | 15 | 3 | 1/10 | 0.18 | 0-4 |
| -1 | 15 | 2 | 0/10 | -1.07 | 0-4 |
| -1 | 15 | 2 | 3/10 | -1.67 | 0-2 |
| -1 | 15 | 1 | 1/10 | -1.12 | 0-2 |
This shows that we would have the power to detect a medium effect (-0.357 in Table SM1.1) with a sample size of 15 if the noise (\(\sigma\)) is <15%. We would be unlikely to get a false negative because there were no false negatives in the simulations (i.e., the posterior sample range did not cross zero). With this sample size, when \(\beta\)=0, there are no false positives (i.e., the posterior sample range always included zero). However, we would not be able to detect a weak effect unless the noise (\(\sigma\)) was much smaller.
To determine whether experimenters coded the dependent variables in a repeatable way, hypothesis-blind video coders were first trained in video coding the dependent variable, and then they coded at least 20% of the videos in the reversal (tubes) and multi-access box experiments. We randomly chose a subset of all of the birds who participated in each experiment using random.org:
Reversal 6/20 grackles (30% with half from the control group): Chalupa, Avocada, Diablo, Fideo, Tomatillo, Adobo
Multi-access box plastic 3/15 grackles (20%): Habanero, Queso, Chalupa
Multi-access box log 3/12 grackles (25%): Diablo, Adobo, Yuca
Video coders then analyzed all videos from these birds. The experimenter’s data was compared with the video coder data using the intra-class correlation coefficient (ICC) to determine the degree of bias in the regression slope (Hutcheon et al., 2010; using the irr package in R: Gamer et al., 2012). Note that the data in columns from coders 1 and 2 in the data sheets were aligned based on similar numbers between coders to prevent disagreements near the top of the data sheet from misaligning all subsequent entries.
To pass interobserver reliability (IOR) training, video coders needed an ICC score of 0.90 or greater to ensure the instructions were clear and that there was a high degree of agreement across coders (see R code comments for details).
Alexis Breen (compared with experimenter’s live coding):
Multi-access box: correct choice unweighted Cohen’s Kappa=0.90 (confidence boundaries=0.77-1.00, n=33 data points)
Multi-access box: locus solved unweighted Cohen’s Kappa=0.90 (confidence boundaries=0.76-1.00, n=33 data points)
Note: Breen was not a hypothesis-blind video coder. She contributed to extensive video coding across the whole project, however, for interobserver reliability analyses, her data were always compared with a hypothesis-blind coder’s data.
Anja Becker (compared with experimenter’s live coding):
Tiana Lam (compared with experimenter’s live coding):
Multi-access box: correct choice ICC=0.90 (confidence boundaries=0.77-1.00, n=33 data points)
Multi-access box: locus solved unweighted Cohen’s Kappa=0.95 (confidence boundaries=0.84-1.00, n=33 data points)
Brynna Hood (compared with experimenter’s live coding):
Multi-access log: correct choice unweighted Cohen’s Kappa=1.00 (confidence boundaries=1.00-1.00, n=29 data points)
Multi-access log: locus solved unweighted Cohen’s Kappa=1.00 (confidence boundaries=1.00-1.00, n=29 data points)
Interobserver reliability scores (minimum 20% of the videos) were as follows:
Brynna Hood (compared with experimenter’s live coding):
Multi-access log: correct choice unweighted Cohen’s Kappa=0.91 (confidence boundaries=0.76-1.00, n=39 data points)
Multi-access log: locus solved unweighted Cohen’s Kappa=1.00 (confidence boundaries=1.00-1.00, n=39 data points)
Tiana Lam (compared with experimenter’s live coding):
Multi-access box: correct choice unweighted Cohen’s Kappa=0.83 (confidence boundaries=0.73-0.92, n=102 data points)
Multi-access box: locus solved unweighted Cohen’s Kappa=0.90 (confidence boundaries=0.83-0.97, n=102 data points)
Anja Becker (compared with experimenter’s live coding):
These scores indicate that the dependent variables are repeatable to a high or extremely high degree given our instructions and training.
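For readers unfamiliar with the agreement statistic reported above, the following Python sketch shows how an unweighted Cohen’s Kappa is computed from two coders’ categorical scores. It is a minimal illustration, not the analysis code (the analyses used the irr package in R), and the example scores are made up.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must score the same non-empty set of items.")
    n = len(coder_a)
    # Share of items on which the two coders gave the same score.
    observed = sum(1 for a, b in zip(coder_a, coder_b) if a == b) / n
    # Agreement expected by chance from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

# Made-up scores for illustration (1 = correct choice, 0 = incorrect choice).
live_coding = [1, 1, 0, 0, 1, 0, 1, 1]
video_coding = [1, 1, 0, 0, 1, 0, 0, 1]
kappa = cohens_kappa(live_coding, video_coding)  # 0.75 for these made-up scores
```

Because kappa subtracts the agreement expected by chance, it is lower than the raw proportion of matching scores (7 of 8 in this example).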
Table SM3. Model outputs for the number of loci solved and the latency to switch loci after passing criterion on a different locus on the plastic (models 1-5 and 9-11) and wooden (models 6-8 and 12-14) multi-access boxes. SD=standard deviation, the 89% prediction intervals are shown, n_eff=effective sample size, Rhat4=an indicator of model convergence (1.00 is ideal), b=the slope of the relationship between loci solved or average switch latency and the number of trials to pass the reversal.
| | Mean | SD | Lower 89 percentile compatibility interval (5.5%) | Upper 89 percentile compatibility interval (94.5%) | n_eff | Rhat4 |
|---|---|---|---|---|---|---|
| MODEL 1 (last reversal): loci solved plastic ~ a[batch] + b*trials | ||||||
| a[1] | 0.04 | 0.46 | -0.70 | 0.78 | 2304 | 1.00 |
| a[2] | 0.29 | 0.36 | -0.30 | 0.87 | 2456 | 1.00 |
| a[3] | -0.78 | 0.55 | -1.65 | 0.08 | 2510 | 1.00 |
| b | -0.22 | 0.25 | -0.63 | 0.18 | 2364 | 1.00 |
| MODEL 2 (last reversal): loci solved plastic ~ a + b*trials | ||||||
| a | -0.02 | 0.24 | -0.40 | 0.35 | 1466 | 1.00 |
| b | -0.46 | 0.31 | -0.97 | -0.01 | 1383 | 1.00 |
| MODEL 3 (last reversal): trials ~ a[batch] | ||||||
| a[1] | 0.09 | 0.37 | -0.48 | 0.69 | 2095 | 1.00 |
| a[2] | -0.21 | 0.29 | -0.68 | 0.25 | 1715 | 1.00 |
| a[3] | 0.25 | 0.39 | -0.38 | 0.86 | 2161 | 1.00 |
| sigma | 1.03 | 0.21 | 0.75 | 1.39 | 2049 | 1.00 |
| MODEL 4: loci solved ~ a[condition] | ||||||
| a[1] control | -0.11 | 0.32 | -0.62 | 0.40 | 1311 | 1.00 |
| a[2] manipulated | 0.15 | 0.39 | -0.46 | 0.80 | 1222 | 1.00 |
| MODEL 5 (first reversal): loci solved plastic ~ a + b*trials | ||||||
| a | 0.00 | 0.24 | -0.37 | 0.39 | 1208 | 1.00 |
| b | -0.44 | 0.30 | -0.94 | 0.02 | 1273 | 1.00 |
| MODEL 6 (last reversal): loci solved wooden ~ a + b*trials | ||||||
| a | 1.06 | 0.27 | 0.63 | 1.50 | 1255 | 1.00 |
| b | 0.41 | 0.43 | -0.21 | 1.13 | 1107 | 1.00 |
| MODEL 7: loci solved ~ a[condition] | ||||||
| a[1] control | -0.45 | 0.40 | -1.10 | 0.18 | 1161 | 1.00 |
| a[2] manipulated | 0.77 | 0.41 | 0.13 | 1.44 | 1302 | 1.00 |
| MODEL 8 (first reversal): loci solved wooden ~ a + b*trials | ||||||
| a | 0.11 | 0.26 | -0.30 | 0.52 | 1221 | 1.00 |
| b | -0.50 | 0.35 | -1.09 | 0.04 | 1234 | 1.00 |
| MODEL 9 (last reversal): avg switch latency plastic ~ a + b*trials | ||||||
| a | 4.93 | 0.30 | 4.45 | 5.41 | 1235 | 1.01 |
| b | 0.46 | 0.29 | 0.00 | 0.92 | 1363 | 1.00 |
| phi | 0.93 | 0.35 | 0.44 | 1.55 | 1476 | 1.00 |
| MODEL 10: avg switch latency plastic ~ a[condition] | ||||||
| a[1] manipulated | 4.07 | 0.39 | 3.46 | 4.68 | 1027 | 1.00 |
| a[2] control | 5.18 | 0.39 | 4.50 | 5.76 | 1006 | 1.00 |
| phi | 0.91 | 0.41 | 0.37 | 1.63 | 925 | 1.01 |
| MODEL 11 (first reversal): avg switch latency plastic ~ a + b*trials | ||||||
| a | 4.93 | 0.29 | 4.46 | 5.39 | 1488 | 1.00 |
| b | 0.46 | 0.28 | 0.02 | 0.93 | 1211 | 1.00 |
| phi | 0.94 | 0.36 | 0.44 | 1.60 | 1447 | 1.00 |
| MODEL 12 (last reversal): avg switch latency wooden ~ a + b*trials | ||||||
| a | 5.75 | 0.28 | 5.28 | 6.18 | 1049 | 1.00 |
| b | -0.41 | 0.32 | -0.86 | 0.15 | 1281 | 1.01 |
| phi | 1.04 | 0.42 | 0.48 | 1.77 | 1456 | 1.00 |
| MODEL 13: avg switch latency wooden ~ a[condition] | ||||||
| a[1] control | 5.31 | 0.42 | 4.61 | 5.95 | 701 | 1.00 |
| a[2] manipulated | 5.34 | 0.44 | 4.61 | 6.00 | 620 | 1.01 |
| phi | 0.66 | 0.32 | 0.25 | 1.25 | 806 | 1.00 |
| MODEL 14 (first reversal): avg switch latency wooden ~ a + b*trials | ||||||
| a | 5.71 | 0.26 | 5.28 | 6.12 | 1109 | 1.00 |
| b | -0.50 | 0.28 | -0.89 | -0.01 | 1308 | 1.00 |
| phi | 1.08 | 0.41 | 0.53 | 1.80 | 1347 | 1.00 |
In the tube experiment, it took four grackles (Queso, Mole, Habanero, and Tapa) an average of 40 trials (sd=12) in the initial discrimination phase to learn to prefer a color, while it took the same individuals an average of 390 trials (sd=59) to learn to prefer a shape using the touchscreen. The two individuals who were faster to learn in the tube experiment were slower to learn in the touchscreen experiment. For the reversal, it took three of these individuals (Queso, Mole, and Habanero) an average of 80 trials (sd=14) to reverse their colored tube preference, and an average of 362 trials (sd=111) to reverse their shape preference on the touchscreen (Tapa had to be released back to the wild before finishing the experiment, but was on trial 629 in reversal one of the touchscreen experiment at the time of release. In the tube experiment, she was also the slowest of the four to reverse at 100 trials). All three individuals were about equally fast at the reversal in the tube experiment, while their reversal learning speeds differed on the touchscreen. The touchscreen training data and a summary of the training process are detailed in Seitz et al. (2021).
Table SM5. Summarized results per bird in the reversal learning (tube and touchscreen) and multi-access box (plastic and wooden) experiments. “Reversals to pass” indicates how many serial reversals it took a bird to pass criterion (passing two consecutive reversals in 50 trials or less) if they were in the flexibility manipulation condition. X indicates the bird attempted, but did not pass that experiment. Note: Tapa did not finish the MAB log experiment; Marisco’s MAB log experiment ended too early due to experimenter error (timed out on 2 consecutive sessions, not 3); Mole and Habanero: do not count MAB plastic number of options solved because they were given the box fully put together for habituation due to experimenter error; Taco was the first juvenile we tested and we did not put him in the flexibility experiment: he received 1 reversal and moved on to his next test, therefore he was essentially a control bird without the matched yellow tube experience.
| Bird | Batch | Sex | Trials to learn (tube) | Trials to first reversal (tube) | Trials to last reversal (tube) | Reversals to pass | Total loci solved (MAB plastic) | Total loci solved (MAB wooden) | Average latency to attempt new locus (MAB plastic) | Average latency to attempt new locus (MAB wooden) | Trials to learn (touchscreen) | Trials to first reversal (touchscreen) | Motor actions (MAB plastic) | Motor actions (MAB wooden) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tomatillo | 1 | M | 40 | 50 | 50 | Control | 3 | 317 | 13 | |||||
| Queso | 1 | M | 50 | 70 | 70 | Control | 1 | 88 | 330 | 460 | 8 | |||
| Tapa | 1 | F | 30 | 100 | 100 | Control | 4 | 685 | 450 | (629+) | 12 | |||
| Yuca | 3 | F | 40 | 80 | 80 | Control | 4 | 4 | 132 | 77 | 13 | 16 | ||
| Marisco | 3 | M | 40 | 50 | 50 | Control | 1 | 2 | 208 | 3 | 7 | |||
| Pizza | 3 | M | 50 | 60 | 60 | Control | 0 | 1 | 1482 | 0 | 8 | |||
| Mofongo | 4 | M | 20 | 40 | 40 | Control | 3 | 4 | 502 | 630 | 13 | 14 | ||
| Taquito | 4 | M | 90 | 160 | 160 | Control | 0 | 4 | 100 | 11 | 10 | |||
| Chalupa | 1 | F | 50 | 90 | 50 | 8 | 0 | 6 | ||||||
| Mole | 1 | M | 30 | 70 | 50 | 7 | 4 | 4 | 356 | 1173 | 431 | 307 | 14 | 15 |
| Habanero | 1 | M | 50 | 80 | 40 | 6 | 4 | 28 | 350 | 290 | 15 | |||
| Diablo | 3 | M | 20 | 80 | 40 | 8 | 2 | 1 | 25 | 10 | 2 | |||
| Burrito | 3 | M | 40 | 60 | 23 | 8 | 3 | 4 | 76 | 391 | 17 | 18 | ||
| Adobo | 3 | M | 50 | 100 | 50 | 6 | 4 | 4 | 31 | 79 | 16 | 18 | ||
| Chilaquile | 3 | JM | 30 | 40 | 30 | 6 | 4 | 4 | 44 | 170 | 19 | 11 | ||
| Pollito | 4 | M | 40 | 60 | 40 | 8 | 0 | 3 | 668 | 0 | 11 | |||
| Taco | 3a | JM | 50 | 80 | 80 | (Control) | 1 | 4 | 117 | 2 | 19 | |||
| Memela | 1 | F | 50 | 60 | 80 | X (11+) | ||||||||
| Fideo | 2 | M | 60 | 70 | 70 | Control | ||||||||
| Avocada | 1 | F | 50 | 100 | 100 | Control | ||||||||
| Huachinago | 3 | M | 70 | Control | ||||||||||
| Guacamole | 4 | M | 30 |
Below are figures for the proportion of trials correct by trial number and reversal for each bird.
Figure SM6.1. Adobo’s proportion of trials correct by trial number and reversal.
Figure SM6.2. Chalupa’s proportion of trials correct by trial number and reversal.
Figure SM6.3. Chilaquile’s proportion of trials correct by trial number and reversal.
Figure SM6.4. Diablo’s proportion of trials correct by trial number and reversal.
Figure SM6.5. Habanero’s proportion of trials correct by trial number and reversal.
Figure SM6.6. Memela’s proportion of trials correct by trial number and reversal.
Figure SM6.7. Mole’s proportion of trials correct by trial number and reversal.
Figure SM6.8. Pollito’s proportion of trials correct by trial number and reversal.